Characterizing Model Performance in the Feature Space

Authors

Stephen D. Bay

Michael J. Pazzani

Abstract

A fundamental problem in machine learning is understanding the conditions under which a learning algorithm works well. Understanding an algorithm's strengths and weaknesses, and being able to compare two algorithms with each other, is necessary for designers to develop (or select) learning algorithms for a specific problem. Generally, one can attempt to analyze and understand algorithms either theoretically or empirically.

Theoretical analyses of machine learning algorithms have usually resulted in weak performance guarantees that are of little use to a practitioner. Algorithms are typically proven to be asymptotically consistent (i.e., they will achieve the Bayes optimal error rate given enough training examples) or shown to PAC (Probably Approximately Correct) learn a given concept (Valiant, 1984). Another approach is to analyze average-case behavior under specific distributional assumptions, such as learning m-of-n concepts (Langley & Sage, 1999). Although these analyses are useful in understanding the general behavior of an algorithm, they cannot guide the designer with specific predictions of an algorithm's performance on a given problem.

Thus most researchers and practitioners resort to empirical evaluation to understand the interaction between learning algorithms and a domain. Unfortunately, most evaluation methods give very little information to the designer. For example, the most common method of empirically evaluating a classifier is to examine its error, or more generally its loss, and many comparisons of algorithms use only this metric. Loss can easily be estimated by using a test set or cross-validation. However, because loss is a single number, it reveals little about the algorithm except gross performance on the domain.

Other researchers have tried to learn when a classification algorithm is appropriate for a problem domain based on characteristics of the data set. The basic idea is to use properties such as the number of features, the number of classes, or the number of instances to learn, by inspection or through automated analysis, when an algorithm is appropriate. For example, Aha (1992) used a rule learner to automatically derive results such as the following.
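
The sketch below is not one of the rules Aha derived. It is a minimal, hypothetical illustration of the earlier point about loss: k-fold cross-validation yields an estimate of 0-1 loss, and that estimate is a single number. The 1-nearest-neighbor classifier, the toy data, and the function names are assumptions chosen for illustration, not anything taken from the paper.

```python
import random


def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-NN, squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def cross_validated_loss(X, y, k=5, seed=0):
    """Estimate 0-1 loss by k-fold cross-validation; the result is one number in [0, 1]."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # Each held-out point is classified by a model that never saw it.
        for i in fold:
            if nearest_neighbor_predict(train_X, train_y, X[i]) != y[i]:
                errors += 1
    return errors / len(X)


# Hypothetical toy data: two well-separated clusters in a 2-D feature space.
cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
X = cluster + [(a + 10.0, b + 10.0) for a, b in cluster]
y = [0] * 5 + [1] * 5

loss = cross_validated_loss(X, y, k=5)
print(loss)
```

The returned value summarizes performance over the whole data set. It does not indicate which instances, or which regions of the feature space, the classifier gets wrong.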

Related resources

Characterizing imageability in Gajar houses of Tabriz

Aims: From Lufor’s perspective, space is constructed based on spatial operation, recreation and the space in which recreation takes place. This approach is a descriptive view of the relationship between space from a materialistic point of view and its dominant ideas with its dwellers. In this perspective, humans integrate distinct and indistinct data of space and create map-like mental images f...

Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy

Dimensionality reduction methods transform or select a low-dimensional feature space to efficiently represent the original high-dimensional feature space of the data. Feature reduction techniques are an important step in many pattern recognition problems in different fields, especially in the analysis of high-dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...

Determining Optimal Support Vector Machines for Hyperspectral Image Classification Based on a Genetic Algorithm

Hyperspectral remote sensing imagery, due to its rich source of spectral information, provides an efficient tool for ground classification in complex geographical areas with similar classes. Given the robustness of Support Vector Machines (SVMs) in high-dimensional spaces, they are an efficient tool for the classification of hyperspectral imagery. However, there are two optimization issues which s...

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

Determining Effective Features for Face Detection Using a Hybrid Feature Approach

Detecting faces in cluttered backgrounds and in the real world remains an unsolved problem. In this paper, by combining several kinds of independent features with one of the most common appearance-based approaches and multilayer perceptron (MLP) neural networks, not only have some questions been answered, but the designed system has also achieved better performance than the pre...

Publication date: 2007
